Familypedia:Site schema
This page describes the conceptual structure of Familypedia's SMW organization and conventions, and some of the rationale behind them.

Semantic MediaWiki (SMW) has limitations, and issues will arise because it lacks semantic primitives that one might think are elementary. For example, there is no support for inverse properties. Setting the property "has father" on one article ought to imply that the father's article "has child", but SMW does not infer this, so given the informality of wikis, database consistency becomes an issue: an article can list another person as its father while the father's article does not list the child. SMW and automated bots can alleviate this problem somewhat, but it will continually reappear in various forms.

Data abstraction structures separate from data presentation

Set facts versus Showfacts templates: XML has popularized the idea of keeping data abstraction separate from data presentation. The idea is sound but comes at a cost. The typical way to declare knowledge in Wikipedia is to provide it as a parameter to a template that immediately presents the data in a uniform way, such as a navbox or infobox. SMW instead provides the notion of declaring knowledge within the stream of everyday narrative text. That format is attractive to family historians, but less so for genealogy, where a predictable tabular format for pertinent biographical data makes scanning for information much less tedious. Semantic Forms allows tabular presentations such as infoboxes to have their values set via a form, and for those values also to be fed into SMW after presentation.

Familypedia's requirement is to be able to declare information in a regularized way via forms, so that neophyte users know what questions to ask and can create professional-looking articles with a minimum of knowledge of wiki editing.
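As a rough illustration of keeping declaration separate from presentation, the following Python sketch models the two as independent steps. It is not Familypedia's template code: the `facts` store, the functions `set_birth`, `showfacts_person` and `show_narrative`, and the property names are all hypothetical stand-ins for a Set template, two presentation templates, and SMW properties.

```python
# Hypothetical stand-in for SMW's property store: page name -> {property: value}.
facts = {}


def set_birth(page, date, place):
    """Declare birth facts for a page. Like a Set template, it displays nothing."""
    facts.setdefault(page, {}).update({"birth date": date, "birth place": place})


def showfacts_person(page):
    """Render an infobox-like table from whatever facts are already stored."""
    props = facts.get(page, {})
    rows = [
        ("Birth", props.get("birth date")),
        ("Birth place", props.get("birth place")),
    ]
    # Rows with no stored value are simply left out of the table.
    return "\n".join(f"{label}: {value}" for label, value in rows if value)


def show_narrative(page):
    """A second presentation built from the same stored facts."""
    props = facts.get(page, {})
    if "birth date" not in props or "birth place" not in props:
        return ""
    return f"{page} was born on {props['birth date']} in {props['birth place']}."


# Facts are declared once, then read by both presentations.
set_birth("Example Person", "1 January 1900", "Example Town")
print(showfacts_person("Example Person"))
print(show_narrative("Example Person"))
```

Because both presentations read only from the store, a further presentation could be added in this sketch without changing how the facts are declared.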
Rather than tie the information to particular templates that would immediately display it, declaration of knowledge is arranged thematically, so that alternate presentations requiring different data in particular templates do not require refactoring the data model. The cost of presenting information from SMW properties rather than directly from template parameters is that users must store the SMW properties before they can see them presented. This is the source of the "double save" problem.

One solution would be to reorganize the Set templates so that parameters are not organized thematically (e.g. all birth data in one template, death data in another...) but instead belong to the templates that actually use them. For example, Showfacts person would have parameters for birth and death date, but not birth photos or attendees, since these are not displayed by that template. A gallery template might accept the birth photos and wedding photos, but not the attendees. Under that arrangement, information quickly becomes scattered among myriad templates and forms, and as those presentation templates evolve, so goes the fate of the information declared for them. It is a road towards data balkanization, a well-worn path that Familypedia will not tread.

The group of Set templates (set families, set birth, set death...) declares data and usually has no visual UI. These templates have a one-to-one relationship with subforms, which in turn have a one-to-one relationship with each of the events or entities of a person's biography. Showfacts templates may change with time, but to give an overview of presentations: showfacts person, showfacts children, and showfacts biography are the major consumers of SMW data.

Families

A more tidy database schema might abstract family groups as a first-class entity in its own right.
However, this would have introduced multiple indirections in order to retrieve routine data about individuals, making the database inaccessible to casual template writers and end users. For this reason, the structure was simplified: families are treated as pseudo groups, stored as properties of parents. The cost is that two parents can declare overlapping family groups. This sort of inconsistency or redundancy is easily discovered and flagged by SMW template software and remedied by bots.

Structure: children of two parents are placed in a group. Biological children are placed in a separate group from adopted or foster children. Children with another coparent also go in a separate family group. The list of children is placed in either one coparent's article or the other's, but never both. That article becomes the basepage for that family group.
:Numbering: the coparent for the family group has the same number as the children list. For instance, coparent-g1 is the coparent for children-g1. Numbering has no significance and implies nothing about chronology or importance. The article for the coparent will list the spouse/partner in its own coparent property, but the group number will in many cases differ from that in the basepage.

Rule: the group number has very little significance outside the narrow context of the coparent article.

Locations

Goals
#Commonality of names across languages.
#Capability to scope queries based on knowledge of the containment hierarchy of locations, e.g. that Calgary is a town in a county, in a province (Alberta), in a country (Canada).
#Capability to autocomplete forms based on the containment hierarchy.

Categories versus Properties

Properties would have been used exclusively, except that nested properties cannot be used for autocompletion. For this reason we have both a category tree and a property tree for locations.
*Sublocations: given a location, constrain autocompletion to a list of locations within that location.
**The Tree: Category:Valid name of {geographic group}- {location} is used for autocompletion. Given a location, e.g. New York, it produces a list of all locations within that tree, including those in nested subcategories. The reason for the dash at the end of the geographic group is that frames support variables that can be passed from the template in the form . Though this is not functional now, it should be possible to pass the supergroup into the form so that autocompletion could run on, for example, Category:Valid name- locality- New York, where "New York" was set by
*Super Locations: given a location, find the locations that it is within.
**The Tree: Property: locality of subdivision and the other Property:{geographic subgroup} of {geographic supergroup} properties allow an #ask on the property to return the superlocation for a given location, to constrain a query to a particular superlocation, or to list all sublocations of a given location. This is the preferred access mechanism for locations. The category tree is only for autocompletion.

Source of information

Location properties and categories are extracted from the following templates:
*Template:Infobox_UK_place
*Template:Infobox England traditional county
*Template:Infobox England county
*Template:Infobox Scotland county
*Template:Infobox Scotland council area
*Template:Infobox U.S. County
*Template:Infobox Settlement
*Template:Infobox District DE
*Template:Infobox German Bundesland
*Template:Infobox Australian Place
*Template:Infobox Australian cadastral
*Template:Infobox Indian Jurisdiction

Mapping from Info Pages

Differences
*Image caption, an info field of a person, was probably never used. It is probably bad design to let it continue. Caption descriptions are a property of the image, not of the article.
Documents all have rich needs for descriptions in all languages, so robust support for this should be centralized on the files and the corresponding Form:Media facts.

Multilingual factors

Logged-out users and crawlers see pages in the default language, English. When users are logged in, we know their language preference and can make adjustments. For this reason, there is a distinction between page language and user language. When the user language is unknown, we default to the page's language. Unfortunately, there is no way of distinguishing logged-out users from users whose preferred language is English. This means that table labels and values will appear localized if the user is logged in with a non-English language preference, but this convenience will not exist for those with an English language preference.

For example, when visiting an English article such as Louise Henriëtte van Nassau (1627-1667), a user logged in with a Spanish language preference will read the infobox table in their preferred language. Using MediaWiki message strings, the table label "Birth" will appear as "Nacimiento". Using the database lookups described below, every place name that has a localized string equivalent specified will be shown in that form. So the birth place will read "La Haya, Holanda Meridional, Países Bajos" rather than "The Hague, South Holland, Netherlands". How this transformation is performed is described below. Further, the autonarrative generated by will also be in the user's preferred language, since the template has subtemplates employing Spanish rather than English connective grammar, and the place and person names have been localized using the same facility described below.

Pseudo namespaces by language: (.de)/(.es) etc. suffixes

By convention (not by programming requirement), the main namespace is segmented informally by appending a suffix to all names, e.g. (.es) for Spanish.
This would allow any Wikipedia name to be imported from a particular language without collisions, for instance Den Haag (.nl) vs. Den Haag (.de). Pseudo namespaces using the more conventional "es:" prefix pattern were considered but rejected, because autocompletion in dialogs and in the editor operates on first characters, and visitors would have to know arcane language codes to make effective use of it. At the time of this writing no code depends on this naming standard, but this could change, since "(." is a very distinctive sequence and some users may decide to take a shortcut around the canonical way of looking up the language via a database call.

How names are looked up

Interlanguage lookup requires a base page where common properties are shared between languages. By convention, the basepages reside in the English pseudo namespace. Any name, whether of a place, a person or any other article, can be looked up with a single query.

The article Den Haag (.nl) is used for illustration. On creating the article, the user attempts to include SMW templates on the page and is informed that the basepage is unknown, and is instructed to go to the English-language article on the place or person and specify the Dutch name for the article. This is done via a form on The Hague article, after which the Dutch article name is recorded in two properties: "article (.nl)" and "articles". The "article (.nl)" property is a subproperty of the "Uses smwbasepage" property, so that a query on an article in any language always returns one and only one basepage name. This is how we know which page to query when we need information on Den Haag. The "articles" property allows bidirectionality: any article may find its sibling articles in other languages by scanning the articles list.

Category:facts pages documentation
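The basepage lookup described under "How names are looked up" can be sketched in Python as follows. This is only a model of the idea, not the SMW queries the site actually runs: the `BASEPAGES` dictionary stands in for property storage, the function names `find_basepage` and `sibling_articles` are hypothetical, and it is an assumption that the English basepage is stored under the plain name "The Hague". The property names "article (.nl)" and "articles" are taken from the description above.

```python
# Hypothetical stand-in for SMW storage: basepage -> properties recorded on it.
BASEPAGES = {
    "The Hague": {
        "article (.nl)": "Den Haag (.nl)",
        "article (.de)": "Den Haag (.de)",
        "articles": ["Den Haag (.nl)", "Den Haag (.de)"],
    },
}


def find_basepage(article):
    """Return the one basepage that records `article` in an "article (.xx)" property.

    Every "article (.xx)" property is treated as a subproperty of a single
    parent property, so matching on any of them is one lookup.
    """
    matches = [
        base
        for base, props in BASEPAGES.items()
        if base == article
        or any(
            name.startswith("article (.") and value == article
            for name, value in props.items()
        )
    ]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one basepage for {article!r}, found {len(matches)}")
    return matches[0]


def sibling_articles(article):
    """List the same subject's articles in other languages via the "articles" list."""
    base = find_basepage(article)
    candidates = [base] + BASEPAGES[base]["articles"]
    return [name for name in candidates if name != article]


print(find_basepage("Den Haag (.nl)"))
print(sibling_articles("Den Haag (.nl)"))
```

In this sketch an article whose name has not yet been recorded on any basepage raises `LookupError`, which corresponds to the "basepage is unknown" message shown to the user.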